Mosul
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
- North America > Haiti (0.14)
- Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (38 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
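The tokenization disparity the abstract describes can be illustrated with a toy greedy longest-match tokenizer over a deliberately English-centric vocabulary — a minimal sketch only (real BPE vocabularies are learned from data, and the vocabulary and sentences below are invented for illustration):

```python
# Toy longest-match tokenizer: words in the vocabulary cost one token,
# everything else falls back to single characters -- mimicking how a
# vocabulary trained mostly on English inflates token counts elsewhere.
VOCAB = {"hello", "world", "how", "are", "you"}

def tokenize(text, vocab):
    """Greedily match the longest known piece; fall back to single chars."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

english = tokenize("hello how are you", VOCAB)          # 7 tokens
french = tokenize("bonjour comment allez vous", VOCAB)  # 26 tokens (chars)
print(len(english), len(french), len(french) / len(english))
```

The French sentence never matches the vocabulary and degrades to character-level tokens, costing roughly 3.7x as many tokens as the English one — the same mechanism, at scale, behind the up-to-15x disparities reported above.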
Mystery of Egypt's Giza pyramids deepens as hidden megastructure 4,000 feet below is revealed
Joe Rogan's latest podcast guest delved into controversial scans showing an enormous underground structure beneath the Great Pyramid of Giza, potentially rewriting ancient history. The scans were conducted by Italian scientist Filippo Biondi and the Khafre Project team using synthetic aperture radar. More than 200 scans from multiple satellites, including Italy's Cosmo-SkyMed and the US-based Capella Space, showed uniform results suggesting massive pillars about 65 feet in diameter wrapped in spirals and plunging nearly 4,000 feet deep. Those pillars appear to end in 260-foot cubic chambers beneath all three pyramids and the Sphinx, which Biondi described as 'huge chambers' measuring roughly 260 feet in length and width.
- Africa > Middle East > Egypt > Giza Governorate > Giza (1.00)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.65)
- Europe > Italy (0.25)
- (18 more...)
- Transportation > Air (1.00)
- Media > Television (1.00)
- Media > Music (1.00)
- (8 more...)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Mobile (0.66)
DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models
Altakrori, Malik H., Habash, Nizar, Freihat, Abdelhakim, Samih, Younes, Chirkunov, Kirill, AbuOdeh, Muhammed, Florian, Radu, Lynn, Teresa, Nakov, Preslav, Aji, Alham Fikri
We present DialectalArabicMMLU, a new benchmark for evaluating the performance of large language models (LLMs) across Arabic dialects. While recently developed Arabic and multilingual benchmarks have advanced LLM evaluation for Modern Standard Arabic (MSA), dialectal varieties remain underrepresented despite their prevalence in everyday communication. DialectalArabicMMLU extends the MMLU-Redux framework through manual translation and adaptation of 3K multiple-choice question-answer pairs into five major dialects (Syrian, Egyptian, Emirati, Saudi, and Moroccan), yielding a total of 15K QA pairs across 32 academic and professional domains (22K QA pairs when also including English and MSA). The benchmark enables systematic assessment of LLM reasoning and comprehension beyond MSA, supporting both task-based and linguistic analysis. We evaluate 19 open-weight Arabic and multilingual LLMs (1B-13B parameters) and report substantial performance variation across dialects, revealing persistent gaps in dialectal generalization. DialectalArabicMMLU provides the first unified, human-curated resource for measuring dialectal understanding in Arabic, thus promoting more inclusive evaluation and future model development.
- Asia > Middle East > Qatar (0.28)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Middle East > Saudi Arabia (0.14)
- (25 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
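The per-dialect evaluation that DialectalArabicMMLU enables can be sketched as a simple accuracy computation over parallel multiple-choice items. This is a minimal sketch only — the field names, sample items, and the `predict()` stub are hypothetical, not the benchmark's actual format or a real model:

```python
# Per-dialect accuracy over parallel MMLU-style multiple-choice items.
# Item schema and predict() are illustrative stand-ins.
from collections import defaultdict

items = [
    {"dialect": "Egyptian", "question": "...", "choices": ["A", "B", "C", "D"], "answer": 2},
    {"dialect": "Moroccan", "question": "...", "choices": ["A", "B", "C", "D"], "answer": 0},
]

def predict(item):
    """Stand-in for a model call; always picks the first choice."""
    return 0

def accuracy_by_dialect(items, predict_fn):
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        total[item["dialect"]] += 1
        if predict_fn(item) == item["answer"]:
            correct[item["dialect"]] += 1
    return {d: correct[d] / total[d] for d in total}

print(accuracy_by_dialect(items, predict))  # {'Egyptian': 0.0, 'Moroccan': 1.0}
```

Because the 3K base questions are translated in parallel into each dialect, per-dialect accuracies computed this way are directly comparable, which is what exposes the dialectal generalization gaps the authors report.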
FARSIQA: Faithful and Advanced RAG System for Islamic Question Answering
Asl, Mohammad Aghajani, Bidgoli, Behrooz Minaei
The advent of Large Language Models (LLMs) has revolutionized Natural Language Processing, yet their application in high-stakes, specialized domains like religious question answering is hindered by challenges like hallucination and unfaithfulness to authoritative sources. This issue is particularly critical for the Persian-speaking Muslim community, where accuracy and trustworthiness are paramount. Existing Retrieval-Augmented Generation (RAG) systems, relying on simplistic single-pass pipelines, fall short on complex, multi-hop queries requiring multi-step reasoning and evidence aggregation. To address this gap, we introduce FARSIQA, a novel, end-to-end system for Faithful Advanced Question Answering in the Persian Islamic domain. FARSIQA is built upon our innovative FAIR-RAG architecture: a Faithful, Adaptive, Iterative Refinement framework for RAG. FAIR-RAG employs a dynamic, self-correcting process: it adaptively decomposes complex queries, assesses evidence sufficiency, and enters an iterative loop to generate sub-queries, progressively filling information gaps. Operating on a curated knowledge base of over one million authoritative Islamic documents, FARSIQA demonstrates superior performance. Rigorous evaluation on the challenging IslamicPCQA benchmark shows state-of-the-art performance: the system achieves a remarkable 97.0% in Negative Rejection - a 40-point improvement over baselines - and a high Answer Correctness score of 74.3%. Our work establishes a new standard for Persian Islamic QA and validates that our iterative, adaptive architecture is crucial for building faithful, reliable AI systems in sensitive domains.
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Asia > Middle East > Syria > Damascus Governorate > Damascus (0.04)
- Asia > Middle East > Iraq > Nineveh Governorate > Mosul (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
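The FAIR-RAG loop described above — decompose the query, retrieve, assess evidence sufficiency, and iteratively issue sub-queries to fill gaps — can be sketched schematically. Every function body below is an illustrative stub, not FARSIQA's actual implementation:

```python
# Schematic iterative, self-correcting RAG loop in the style the
# FAIR-RAG description outlines. All components are stubs.

def decompose(query):
    return [query]  # stub: a real system would split multi-hop queries

def retrieve(sub_query, k=5):
    return [f"doc for: {sub_query}"]  # stub retriever over a corpus

def is_sufficient(evidence):
    return len(evidence) >= 2  # stub: a real check would judge coverage

def next_sub_query(query, evidence):
    return f"{query} (follow-up {len(evidence)})"  # stub gap-filling query

def answer(query, max_rounds=3):
    evidence = []
    for sq in decompose(query):
        evidence += retrieve(sq)
    rounds = 0
    while not is_sufficient(evidence) and rounds < max_rounds:
        evidence += retrieve(next_sub_query(query, evidence))
        rounds += 1
    return {"query": query, "evidence": evidence, "rounds": rounds}

result = answer("Which sources discuss this ruling?")
print(result["rounds"], len(result["evidence"]))
```

The key design point the abstract stresses is the loop itself: rather than a single retrieve-then-generate pass, the system keeps generating targeted sub-queries until the sufficiency check passes (or a round budget is exhausted), which is what supports multi-hop evidence aggregation.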
Equivalent Linear Mappings of Large Language Models
Despite significant progress in transformer interpretability, an understanding of the computational mechanisms of large language models (LLMs) remains a fundamental challenge. Many approaches interpret a network's hidden representations but remain agnostic about how those representations are generated. We address this by mapping LLM inference for a given input sequence to an equivalent and interpretable linear system which reconstructs the predicted output embedding with relative error below $10^{-13}$ at double floating-point precision, requiring no additional model training. We exploit a property of transformers wherein every operation (gated activations, attention, and normalization) can be expressed as $A(x) \cdot x$, where $A(x)$ represents an input-dependent linear transform and $x$ preserves the linear pathway. To expose this linear structure, we strategically detach components of the gradient computation with respect to an input sequence, freezing the $A(x)$ terms at their values computed during inference, such that the Jacobian yields an equivalent linear mapping. This detached Jacobian of the model reconstructs the output with one linear operator per input token, which is shown for Qwen 3, Gemma 3 and Llama 3, up to Qwen 3 14B. These linear representations demonstrate that LLMs operate in extremely low-dimensional subspaces where the singular vectors can be decoded to interpretable semantic concepts. The computation for each intermediate output also has a linear equivalent, and we examine how the linear representations of individual layers and their attention and multilayer perceptron modules build predictions, and use these as steering operators to insert semantic concepts into unrelated text. Despite their global nonlinearity, LLMs can be interpreted through equivalent linear representations that reveal low-dimensional semantic structures in the next-token prediction process.
- Asia > Middle East > Kuwait (0.05)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.05)
- Asia > Middle East > Jordan (0.04)
- (18 more...)
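The core identity the paper exploits — every operation can be written as $A(x) \cdot x$ with an input-dependent linear transform $A(x)$ — can be shown for a single gated activation in a few lines. This is a sketch of one operation only, not the full detached-Jacobian construction over a whole transformer:

```python
# SiLU(x) = x * sigmoid(x) can be written as A(x) * x where A(x) is the
# (diagonal) input-dependent gate sigmoid(x). Freezing A at the values
# computed for a particular input x yields a linear operator that
# reproduces the nonlinear output exactly at that point.
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def silu(v):
    return v * sigmoid(v)

x = [-1.5, -0.3, 0.0, 0.7, 2.0]

# Diagonal of A(x), frozen at this particular input
A_diag = [sigmoid(v) for v in x]

# Equivalent linear mapping at x: y_i = A_ii * x_i
linear_out = [a * v for a, v in zip(A_diag, x)]
nonlinear_out = [silu(v) for v in x]

err = max(abs(a - b) for a, b in zip(linear_out, nonlinear_out))
print(err)  # 0.0: the frozen-A linear map reproduces SiLU exactly here
```

The paper's contribution is doing this jointly for gated activations, attention, and normalization by detaching the $A(x)$ terms in the gradient computation, so that the Jacobian of the whole network at a given input collapses to one exact linear operator per token.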
Does Local News Stay Local?: Online Content Shifts in Sinclair-Acquired Stations
Wanner, Miriam, Hager, Sophia, Field, Anjalie
Local news stations are often considered to be reliable sources of non-politicized information, particularly local concerns that residents care about. Because these stations are trusted news sources, viewers are particularly susceptible to the information they report. The Sinclair Broadcast group is a broadcasting company that has acquired many local news stations in the last decade. We investigate the effects of local news stations being acquired by Sinclair: how does coverage change? We use computational methods to investigate changes in internet content put out by local news stations before and after being acquired by Sinclair and in comparison to national news outlets. We find that there is clear evidence that local news stations report more frequently on national news at the expense of local topics, and that their coverage of polarizing national topics increases.
- North America > United States > Montana > Missoula County > Missoula (0.28)
- North America > United States > Rhode Island > Providence County > Providence (0.28)
- Asia > Middle East > Israel (0.14)
- (46 more...)
- Media > News (1.00)
- Leisure & Entertainment > Sports > Football (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.92)
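The kind of before/after coverage comparison the abstract describes can be sketched as a national-topic share computed separately for pre- and post-acquisition articles. The term list and sample headlines below are hypothetical, and the paper's actual computational methods are more sophisticated than keyword matching:

```python
# Share of station articles mentioning national-politics terms, as a
# crude stand-in for the pre/post-acquisition coverage comparison.
NATIONAL_TERMS = {"congress", "white house", "senate", "supreme court"}

def national_share(articles):
    """Fraction of articles that mention at least one national term."""
    hits = sum(
        1 for text in articles
        if any(term in text.lower() for term in NATIONAL_TERMS)
    )
    return hits / len(articles)

pre = ["City council approves new park", "School board election results"]
post = ["Congress debates budget", "White House responds", "Local fair opens"]

print(national_share(pre), national_share(post))  # 0.0 vs ~0.67
```

A rising national share after acquisition, relative to both the station's own history and a national-outlet baseline, is the pattern the authors report.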
Semantic-Aware Edge Intelligence for UAV Handover in 6G Networks
Al-Hameed, Aubida A., Qazzaz, Mohammed M. H., Hafeez, Maryam, Zaidi, Syed A.
6G wireless networks aim to exploit semantic awareness to optimize radio resources. By optimizing the transmission through the lens of the desired goal, the energy consumption of transmissions can also be reduced, and the latency can be improved. To that end, this paper investigates a paradigm in which the capabilities of generative AI (GenAI) on the edge are harnessed for network optimization. In particular, we investigate an Unmanned Aerial Vehicle (UAV) handover framework that takes advantage of GenAI and semantic communication to maintain reliable connectivity. To that end, we propose a framework in which a lightweight MobileBERT language model, fine-tuned using Low-Rank Adaptation (LoRA), is deployed on the UAV. This model processes multi-attribute flight and radio measurements and performs multi-label classification to determine appropriate handover action. Concurrently, the model identifies an appropriate set of contextual "Reason Tags" that elucidate the decision's rationale. Our model, evaluated on a rule-based synthetic dataset of UAV handover scenarios, demonstrates the model's high efficacy in learning these rules, achieving high accuracy in predicting the primary handover decision. The model also shows strong performance in identifying supporting reasons, with an F1 micro-score of approximately 0.9 for reason tags.
- Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Iraq > Nineveh Governorate > Mosul (0.04)
- Research Report (1.00)
- Overview (1.00)
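The micro-averaged F1 used to score the multi-label "Reason Tags" pools true positives, false positives, and false negatives over all labels and samples before computing a single F1. A minimal sketch (the tag names below are hypothetical, not the paper's tag set):

```python
# Micro-averaged F1 for multi-label predictions given as sets of tags.
def micro_f1(true_sets, pred_sets):
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # correctly predicted tags
        fp += len(pred - truth)   # spurious tags
        fn += len(truth - pred)   # missed tags
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

truth = [{"low_rsrp", "high_speed"}, {"cell_overload"}]
preds = [{"low_rsrp"}, {"cell_overload", "low_rsrp"}]
print(micro_f1(truth, preds))  # ~0.667
```

Micro-averaging weights every tag occurrence equally, so frequent tags dominate the score — a reasonable choice when, as here, the primary handover decision and its supporting reasons are predicted jointly.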
Optimizing Hyperparameters in CNN for Soil Classification using PSO and Whale Optimization Algorithm
Ibrahim, Yasir Nooruldeen, Ramo, Fawziya Mahmood, Qadir, Mahmood Siddeeq, Al-Shamdeen, Muna Jaffer
Classifying soil images contributes to better land management, increased agricultural output, and practical solutions for environmental issues. Understanding soil quality aids the development of various disciplines, particularly agriculture, civil engineering, and natural resource management, since it helps with risk reduction, performance improvement, and sound decision-making. Artificial intelligence has recently been applied in a number of different fields. In this study, an intelligent model was constructed using Convolutional Neural Networks (CNNs) to classify soil types, and machine learning algorithms were used to enhance classification performance. To improve the CNN's implementation and obtain valuable results when classifying soil-type images, swarm algorithms were employed to select the best hyperparameters for the network: the Whale Optimization Algorithm and the Particle Swarm Optimization algorithm were each used, and their results on the multi-class soil classification task were compared. Accuracy and F1 measures were adopted to test the system, and the proposed approach produced efficient results.
- Asia > Middle East > Iraq > Nineveh Governorate > Mosul (0.06)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
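The Particle Swarm Optimization half of the study can be sketched with a minimal PSO minimizing a toy objective over two continuous "hyperparameters". A real run would train and validate a CNN at each particle position; the quadratic stand-in, inertia/attraction constants, and swarm size below are common PSO defaults, not the paper's settings:

```python
# Minimal PSO: each particle tracks its best position (pbest) and is
# pulled toward both pbest and the swarm-wide best (gbest).
import random

random.seed(0)

def loss(p):  # toy stand-in for validation loss; minimum at (3, -1)
    x, y = p
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

W, C1, C2 = 0.7, 1.5, 1.5   # inertia, cognitive, social weights
n_particles, n_iters = 20, 100

pos = [[random.uniform(-10, 10), random.uniform(-10, 10)] for _ in range(n_particles)]
vel = [[0.0, 0.0] for _ in range(n_particles)]
pbest = [p[:] for p in pos]
gbest = min(pbest, key=loss)[:]

for _ in range(n_iters):
    for i in range(n_particles):
        for d in range(2):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (W * vel[i][d]
                         + C1 * r1 * (pbest[i][d] - pos[i][d])
                         + C2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if loss(pos[i]) < loss(pbest[i]):
            pbest[i] = pos[i][:]
            if loss(pbest[i]) < loss(gbest):
                gbest = pbest[i][:]

print(gbest, loss(gbest))  # converges near (3, -1)
```

The Whale Optimization Algorithm follows the same evaluate-and-move template with a different position-update rule (encircling and spiral moves), which is why the two are directly comparable on the same search space.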
Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish
Er, Yakup Abrek, Kesen, Ilker, Şahin, Gözde Gül, Erdem, Aykut
We introduce Cetvel, a comprehensive benchmark designed to evaluate large language models (LLMs) in Turkish. Existing Turkish benchmarks often lack either task diversity or culturally relevant content, or both. Cetvel addresses these gaps by combining a broad range of both discriminative and generative tasks ensuring content that reflects the linguistic and cultural richness of Turkish language. Cetvel covers 23 tasks grouped into seven categories, including tasks such as grammatical error correction, machine translation, and question answering rooted in Turkish history and idiomatic language. We evaluate 33 open-weight LLMs (up to 70B parameters) covering different model families and instruction paradigms. Our experiments reveal that Turkish-centric instruction-tuned models generally underperform relative to multilingual or general-purpose models (e.g. Llama 3 and Mistral), despite being tailored for the language. Moreover, we show that tasks such as grammatical error correction and extractive question answering are particularly discriminative in differentiating model capabilities. Cetvel offers a comprehensive and culturally grounded evaluation suite for advancing the development and assessment of LLMs in Turkish.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.05)
- Antarctica (0.04)
- (22 more...)
- Education (0.93)
- Health & Medicine > Therapeutic Area (0.93)
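For the extractive QA tasks Cetvel highlights as particularly discriminative, scoring is typically exact match plus token-overlap F1 between the predicted and gold answer spans. A minimal sketch — the example strings are hypothetical, and Cetvel's own scoring details may differ:

```python
# Standard extractive-QA metrics: exact match and token-overlap F1.
def exact_match(pred, gold):
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    p, g = pred.lower().split(), gold.lower().split()
    common, g_left = 0, list(g)
    for tok in p:
        if tok in g_left:       # count each gold token at most once
            common += 1
            g_left.remove(tok)
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Istanbul", "istanbul"))  # 1.0
print(token_f1("in istanbul", "istanbul"))  # ~0.667: partial credit
```

Token-level F1 gives partial credit for near-miss spans, which makes it better than exact match alone at separating models of similar but not identical capability — the property that makes such tasks discriminative.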